In the next few cells, we find the locations where more than 90% of the values in the new_test_per_thousand column are missing.
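
As a rough illustration of this check, the sketch below computes the share of missing values per location and keeps the locations above the 90% threshold. The toy data and the grouping column name `location` are assumptions made for the example; only new_test_per_thousand comes from the text above.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
# The "location" column name is an assumption for this sketch.
df = pd.DataFrame({
    "location": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
    "new_test_per_thousand": (
        [np.nan] * 10                 # A: every value missing
        + [np.nan] * 5 + [0.5] * 5    # B: half of the values missing
        + [0.1] * 10                  # C: nothing missing
    ),
})

# Fraction of missing values in the column, computed per location.
missing_share = df["new_test_per_thousand"].isna().groupby(df["location"]).mean()

# Locations where strictly more than 90% of the values are missing.
sparse_locations = missing_share[missing_share > 0.9].index.tolist()
print(sparse_locations)
```

Taking the mean of the boolean `isna()` mask gives the missing fraction directly, so no separate count and division is needed.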

Below, we impute the missing data using mean and mode values:
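
A minimal sketch of this kind of imputation, assuming the mean is used for numeric columns and the mode for categorical ones (the text above does not say which columns get which). The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical columns and values, used only to illustrate the idea.
df = pd.DataFrame({
    "new_cases": [10.0, np.nan, 30.0, 20.0],
    "continent": ["Asia", "Asia", None, "Europe"],
})

# Assumption: numeric columns get the mean, all other columns get the mode.
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.columns.difference(numeric_cols)

imputed = df.copy()

# Fill each numeric column with its own column mean.
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

# Fill each categorical column with its most frequent value.
for col in categorical_cols:
    imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(imputed)
```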

Outlier removal using the Isolation Forest algorithm:
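
The sketch below shows one way outlier removal with scikit-learn's `IsolationForest` could look. The synthetic data, the feature names, and the `contamination` value are assumptions for illustration, not the settings used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic features (assumed for this sketch): 200 ordinary points
# plus one point placed far away from the rest.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(0, 1, size=(200, 2)), columns=["f1", "f2"])
X.loc[len(X)] = [15.0, 15.0]

# contamination is the expected share of outliers; 0.01 is an illustrative guess.
iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)

# fit_predict returns +1 for inliers and -1 for outliers.
labels = iso.fit_predict(X)

# Keep only the rows flagged as inliers.
X_clean = X[labels == 1]
print(len(X), "rows before,", len(X_clean), "rows after")
```

The `contamination` parameter directly controls how many rows are dropped, so it is worth choosing it with the downstream model in mind.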

XGBoost algorithm:

Support vector machine
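
As a sketch of this step, the example below fits a support vector regressor with scikit-learn, assuming the task is regression. Features are standardized first because SVMs are sensitive to feature scale. The data, kernel, and `C`/`epsilon` values are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale the features, then fit an RBF-kernel SVR (illustrative settings).
svr_model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr_model.fit(X_train, y_train)

# Root mean squared error on the held-out split.
svr_preds = svr_model.predict(X_test)
svr_rmse = float(np.sqrt(mean_squared_error(y_test, svr_preds)))
print("SVR RMSE:", svr_rmse)
```

Wrapping the scaler and the model in one pipeline ensures the scaling statistics are learned from the training split only.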

Lasso regression implementation:
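
A minimal sketch of Lasso regression with scikit-learn. The synthetic data has two informative and two uninformative features, to show how the L1 penalty shrinks unhelpful coefficients toward zero. The data and the `alpha` value are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): only the first two features drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# alpha sets the strength of the L1 penalty; 0.1 is an illustrative value.
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)

# Root mean squared error on the held-out split.
lasso_preds = lasso_model.predict(X_test)
lasso_rmse = float(np.sqrt(mean_squared_error(y_test, lasso_preds)))

print("Coefficients:", lasso_model.coef_)
print("Lasso RMSE:", lasso_rmse)
```

In practice `alpha` is usually chosen by cross-validation, for example with `LassoCV`.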

LightGBM